-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TEP-0104]: Support Task-level resource limits #703
Conversation
/assign @vdemeester @jerop |
added checks for the update (with limit) as this thread |
@@ -118,63 +118,69 @@ spec: | |||
resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a check for the augment on the existing Task.spec.resources
field which contains inputs
and outputs
for TaskResources
thinking perhaps the context will be a bit diff - as inputs/outputs
for running the actual build task while the added limits/requests
for running the beneath Pod. but after checking other existing resources
fields, like Task.Steps.*
etc, think still should be a better choice to keep the same field name and add the limits
and requests
, as for a consistent naming on the same funcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, unfortunately resources
is an overloaded word here. However, I agree that it makes sense to use it in this context, because it is the terminology Kubernetes uses to describe compute resources, and luckily PipelineResources will be going away which should lessen the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm...that's unfortunate...would the limits and requests fields be added under the current resources
field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see the problem, this is definitely annoying but I'm hoping naming bikeshedding won't block this PR-- maybe we could call it "compute" or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an updates from implementation... tentatively I keep the added requests
and limits
under the mentioned resources
field for this moment
https://github.com/tektoncd/pipeline/pull/4877/files?diff=unified&w=0#diff-f67e6d3007cea84a7a2e6301beb54c39b96b75ac9e57d888cbf5cc86c57510f3R77
thinking these reasons:
- a consistent naming for (container computing) resources requirements as being used in all other
resources
field, likeTask.spec.step.resources
,Task.spec.stepTemplate.resources
- comments on code level and docs updates can be done to clear the minor(but meaningful) context diffs between resources for building steps and container computing resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely less than ideal but I think it's ok given that we'll be removing pipelineresources (and better than the alternatives). thanks for the update Austin!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: we discussed this briefly during last week API working group, and even though the resources
field (PipelineResource) is going away, re-using the same field name is going to create a lot of confusion. We may have to use a more explicit field here even (resource_requirement
or something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yeah I think this needs more discussion. How would you feel about merging this PR discussing behavior changes, and I'll open a subsequent one specifically for naming? I added a note here mentioning the name conflict with pipelineresources.
/assign @dibyom |
@@ -118,63 +118,69 @@ spec: | |||
resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm...that's unfortunate...would the limits and requests fields be added under the current resources
field?
Therefore, this proposal will focus only on task-level resource requests, not limits. | ||
However, the effective resource limit of a pod are not used for scheduling (see | ||
[How Pods with resource requests are scheduled][scheduling] and [How Kubernetes applies resource requests and limits][enforcement]). | ||
Instead, container limits are enforced by the container runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: What happens when the container runtime enforces the limit? Do we currently handle that error somehow? Or does the taskrun just time out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working on the implementation, perhaps can help with some my understanding...
task-level limits will be applied on each step, so each step/container gets enforced with the required limits. Although the task/pod-level will have a larger summed up limits amount(as from each step), the container runtime still follows the limits over the whole process -- due to the sequentially executed steps under Tekton context.
some lines I updated in docs, perhaps help a bit also
https://github.com/tektoncd/pipeline/pull/4877/files?diff=unified&w=0#diff-5007e9db17cf9b70b930577bd627581e10e754347e92e749f5d58614456d7643R82
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
over-limits (task-level) will be checked in a helper func under reconcile logic and throw an error to fail TaskRun
an example like the current over-limits checks for step-level as
validateTaskSpecRequestResources()
also the checks implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in tektoncd/pipeline#4930
/kind tep |
@vdemeester @jerop please take a look when you have the time! (You can also unassign yourselves if you want :) ) |
We should consider deprecating `Task.Step.Resources`, `Task.StepTemplate.Resources`, and `TaskRun.StepOverrides`. | ||
Specifying resource requirements for individual Steps is confusing and likely too granular for many CI/CD workflows. | ||
|
||
We could also consider support for both Task-level and Step-level resource requirements if the requirements are for different types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@austinzhao-go updated the tep to clarify
We could also consider support for both Task-level and Step-level resource requirements if the requirements are for different types
of compute resources (for example, specifying CPU request at the Step level and memory request at the Task level). However,
this functionality will not be supported by the initial implementation of this proposal; it can be added later if desired.
API WG - @vdemeester looking for approval please! |
@@ -118,63 +118,69 @@ spec: | |||
resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: we discussed this briefly during last week API working group, and even though the resources
field (PipelineResource) is going away, re-using the same field name is going to create a lot of confusion. We may have to use a more explicit field here even (resource_requirement
or something)
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dibyom, jerop, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
After further reflection on this proposal, I realized pod effective resource limits are not used for scheduling, and limits are enforced on individual containers by the container runtime. This commit updates TEP-0104 to support Task-level resource limits as a result. It also changes the behavior of how Task-level requirements interact with Step-level requirements and with Sidecars as a result. Lastly, it removes the ability to specify both Step-level requirements and Task-level requirements if the resource types are different, and moves this functionality to potential future work.
all assignees agreed to merge this offline - so merging now - cc @dibyom @vdemeester /lgtm |
After further reflection on this proposal, I realized pod effective resource limits
are not used for scheduling, and limits are enforced on individual containers by the
container runtime. This commit updates TEP-0104 to support Task-level resource limits
as a result. It also changes the behavior of how Task-level requirements interact with
Step-level requirements and with Sidecars as a result.